[LinalgExt] Added toggle for using useExp2 for OnlineAttention decomposition (#22778)
Conversation
…ionOp -> LinalgExt::AttentionOp Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
MaheshRavishankar
left a comment
I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.
I was going to do that, but then I saw where it's part of the decomposition config itself. So I thought I'd refine that attribute and continue using it. Wouldn't making it an optional attribute of the op itself introduce redundancy, since this already exists, @MaheshRavishankar?
I don't know the history of that, but we probably need to drop the old usage and just add an optional attribute here. @Groverkss, comments?
I think it's okay to have a decomposition config dictionary and add these attributes to it. The attention op needs multiple configuration options, so a dictionary is useful.
compiler/src/iree/compiler/Dialect/LinalgExt/IR/test/decompose_aggregate_op.mlir
compiler/src/iree/compiler/Dialect/LinalgExt/Transforms/DecomposeAttention.cpp
compiler/src/iree/compiler/Dialect/LinalgExt/IR/LinalgExtOps.td
@MaheshRavishankar Can you have a look at this again?
keshavvinayak01
left a comment
Let's re-run CI on this and get it merged? @Groverkss
These should run on shark75-ci.
…pass SwizzleHintOps (#23084)

This is the second of a series of PRs that together implement support in IREE for XOR swizzling through the SwizzleHintOp. There are four PRs that need to be merged:
1) Allow rank > 1 swizzle hint op operands and add a pass to flatten swizzle hint allocs.
2) Add patterns which can fold reshapes and `extract_slice` ops into empty ops through swizzle hint ops.
3) Add a swizzle hint attribute to be set in `lowering_config` and consumed in `GPUPromoteMatmulOperandsPass`.
4) Update the `LLVMGPUSelectLoweringStrategy` pass to set XOR swizzles for MXFP4 GEMMs.

This is PR 2, which does two things:
- Duplicates folding patterns for the `tensor.empty` op from upstream llvm-project in IREE, but with support for swizzle hint ops.
- Adds these patterns to the `GPUApplyTilingPass`.

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
This is the first of a series of PRs that together implement support in IREE for XOR swizzling through the SwizzleHintOp. There are four PRs that need to be merged:
1) Allow rank > 1 swizzle hint op operands and add a pass to flatten swizzle hint allocs.
2) Add patterns which can fold reshapes and `extract_slice` ops into empty ops through swizzle hint ops.
3) Add a swizzle hint attribute to be set in `lowering_config` and consumed in `GPUPromoteMatmulOperandsPass`.
4) Update the `LLVMGPUSelectLoweringStrategy` pass to set XOR swizzles for MXFP4 GEMMs.

This is PR 1, which does three things:
- Loosens the restriction that SwizzleHintOp inputs must be a shaped type of rank 1. Tiling is a lot simpler when arbitrary shapes can be folded into the swizzle hint op and flattened later.
- Introduces a pass to flatten allocs associated with `SwizzleHintOps`.
- Moves the verification of flatness of swizzle hint ops into the `ResolveSwizzleHintOps` pass, prior to removal.

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
You can enable it with `-DIREE_REVERSE_ITERATION=On`. I found 4 failing tests but there might be more non-determinism. ``` iree/compiler/Dialect/Stream/Transforms/test/automatic_reference_counting.mlir iree/compiler/Dialect/Stream/Transforms/test/automatic_reference_counting_scf.mlir iree/compiler/Dialect/Util/Transforms/test/hoist_into_globals.mlir iree/compiler/GlobalOptimization/test/hoist_into_globals.mlir ``` Once fixed, I plan to enable this in CI.
Pass booleans instead of `nullptr`; the latter confuses some compilers because both `bool` and `Value` are constructible from `nullptr`. Also clean up comments and needlessly complicated code just above. Fixes: #23164
… modified (#23168)
* Updates the torch_ops configuration file to skip some tests (new tests added without a golden_value, and a new failing test that was not skipped).
* Adds a new rule to configure_ci.py to run torch tests whenever configuration files are modified; otherwise one needs to remember to add ci-extra to run the relevant tests. (onnx and sharktank are not included here since they always run on pre-submit.)
Adds a pass to remove iree_codegen.index_hint operations. The pass unconditionally drops all index_hint ops; it should run once the compiler is done using them for optimizations, since the ops can get in the way of later transformations. The pass is not added to any pipelines yet, because nothing generates index_hint ops so far; it will be wired in once index_hints start to be used.

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Enable tests that were previously excluded but now pass: ROCM/HIP (tests/e2e/linalg): - conv2d, narrow_n_matmuls, subbyte_to_fp, fp_to_subbyte, fp4_f32_conversion, index VMVX (tests/e2e/linalg): - argmax, index VMVX (tests/e2e/linalg_ext_ops): - attention Vulkan (tests/e2e/linalg): - argmax, index Vulkan (tests/e2e/linalg_ext_ops): - map_gather, map_scatter, top-k Vulkan (tests/e2e/stablehlo_ops): - reverse Below is the additional testing time on my machine (using gfx1100): ``` ● Test execution times for newly enabled tests: ┌──────────┬───────┬────────────┐ │ Backend │ Tests │ Total Time │ ├──────────┼───────┼────────────┤ │ ROCM/HIP │ 6 │ 3.06 sec │ ├──────────┼───────┼────────────┤ │ VMVX │ 3 │ 0.28 sec │ ├──────────┼───────┼────────────┤ │ Vulkan │ 6 │ 0.58 sec │ ├──────────┼───────┼────────────┤ │ Total │ 15 │ ~3.9 sec │ └──────────┴───────┴────────────┘ Individual test breakdown: ROCM/HIP: - conv2d: 0.28s - fp4_f32_conversion: 0.39s - fp_to_subbyte: 0.43s - index: 0.27s - narrow_n_matmuls: 0.97s - subbyte_to_fp: 0.72s VMVX: - argmax: 0.04s - index: 0.04s - attention: 0.20s Vulkan: - argmax: 0.05s - index: 0.05s - map_gather: 0.13s - map_scatter: 0.12s - top-k: 0.19s - reverse: 0.05s All tests are fast (under 1 second each). The slowest is narrow_n_matmuls on ROCM at ~1 second. ``` Signed-off-by: hanhanW <hanhan0912@gmail.com>
Injects iree_codegen.index_hint ops on offsets in the populateOperandOffsetsSizesStrides functions for MMAAttrs. We inject the hints here because the semantic information about the offsets is readily available and easily carries down to the later optimization pass that converts loads into transpose loads using these hints. The hints are intended for load-to-transpose-load optimizations, but for simplicity they are set unconditionally, regardless of transposition; the later optimization pass is responsible for determining when loads are transposed, since that is more explicit at that point. The hint ops will be dropped right after LLVMGPULowerExecutableTarget, by which point they should already have been used. Currently, the pass that consumes these hints is not enabled, so they do nothing until that pass is added.

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
I don't want to add too many CI workflows, so adding together with ubsan.
This is hard to test for because only the (dynamic) host feature list is unordered, unlike features for a specific target, and we can't assume a specific host in tests.
* Use `llvm::IsaPred<T>` instead of lambdas where possible * `!any_of` --> `none_of`
What happened here? Why did you close this?
I was trying to rebase and push to trigger CI, but the git history got messed up, so I re-opened it as #23211.
Next time, you can get to … Merging a PR like that introduces overhead for future code tracking, IMO. You force people to click through many links to figure out old review comments and the reasons behind the changes.
Following the discussion from #22441
Depending on the backend, certain computations may benefit from directly using `exp` instead of `exp2`, since there can be accuracy losses due to FP reassociation. It's helpful to add a flag in case the user traces losses to this particular computation and prefers using `exp` directly. The `use_exp2` flag is mostly unused in dialect conversions and passes; I presume it's used as a KernelOption. The changes here do not modify the default behavior.